Abstract

Upon growth on dinitrogen, the filamentous cyanobacterium Nostoc PCC 7120 initiates
metabolic and morphological changes. We analyzed the expression of 1249 genes from major
metabolic categories under nitrogen fixing and non-nitrogen fixing growth. The expression
data were correlated with potential target secondary structures, probe GC-content, predicted
operon structures, and nitrogen content of gene products. Of the selected genes, 494 show a
more than 2-fold difference between the two conditions analyzed. Under nitrogen fixing
conditions, 465 genes, mainly involved in energy metabolism, photosynthesis, respiration and
nitrogen fixation, were found to be more strongly expressed, whereas 29 genes showed stronger
expression under non-nitrogen fixing conditions. Analysis of the nitrogen content of regulated
genes shows that Nostoc PCC 7120 growing on dinitrogen is freed from any constraints to save
nitrogen. For the first time, the expression of high light-induced stress proteins (HLIP-family)
is shown to be linked to nitrogen availability.



Introduction 

Cyanobacteria are intensively used in biotechnology. We are interested in growing cyanobac- 
teria in photobioreactors with the goal of hydrogen production and biomass generation [1,2]. 
Hydrogen metabolism is closely linked to dinitrogen assimilation in nitrogen-fixing bacte- 
ria. Heterocystous nitrogen-fixing cyanobacteria like Nostoc sp. strain PCC 7120 (Nostoc 
PCC 7120) respond to the deprivation of combined nitrogen with morphological changes, 
i.e., the formation of heterocysts. These specialized non-dividing cells develop more or less 
equidistantly along the filaments with a ratio of about one heterocyst per 10 vegetative cells. 
In comparison to vegetative cells, the heterocyst is larger and more rounded. It provides 
an environment with low oxygen partial pressure since it lacks oxygen-evolving photosys- 
tem II activity and has a higher respiration rate [3]. Furthermore, it is surrounded by a 
thick glycolipidic cell wall that reduces the diffusion of oxygen. Heterocysts provide amino 
acids and receive carbohydrates from their neighboring vegetative cells. The differentiation 
process begins within a few hours after combined-nitrogen deprivation, i.e. growth on dini- 
trogen as sole nitrogen source, and requires approximately 24 hours to complete. Although 
the detailed cellular processes underlying heterocyst formation and its regulation are not 
well understood, many single components involved in this process have been described (for 
reviews see: [4-8]). A key protein in heterocyst formation is NtcA, which is the global nitro- 
gen regulator that controls, e.g., the expression of genes essential for heterocyst development 
such as the ABC-type transporter genes devABC and the regulator genes hetR and patS. 
Among many other factors that have been demonstrated to be important for heterocyst de- 
velopment are enzymes that are involved in the formation of heterocyst-specific glycolipids 
and polysaccharides. The driving forces for these structural changes occurring during het- 
erocyst formation are metabolic requirements: the dinitrogen fixing multi-enzyme complex 
nitrogenase requires a low oxygen partial pressure and a high supply of ATP and reducing
equivalents [9]. ATP may be generated by either cyclic photophosphorylation or oxidative
phosphorylation while low-potential electrons may be generated from the degradation of 
carbohydrates produced during photosynthesis [10]. Obviously, profound regulatory events 
coincide with growth on dinitrogen. 

A powerful tool to study gene expression and its regulation is the DNA-microarray technique.
To date, only a few DNA-microarray based gene-expression analyses of heterocystous
cyanobacteria have been reported. With respect to nitrogen metabolism, only one experiment
has been described [11,12]. The authors used a segment-based DNA-microarray, where each
segment covers up to 8 predicted genes [11]. This experimental setup does not allow
expression analysis of individual genes.

Previously, we analyzed the expression of individual genes and operons of Nostoc spp.
that are involved in nitrogen metabolism (for a review see: [1]). Here we describe an
oligonucleotide based DNA-microarray expression analysis, where each gene is covered by up to
10 unique probes. We employed a recently developed microarray technique in which
probe synthesis, hybridization, and signal detection take place in one device under tightly
controlled physical conditions [13,14]. The expression data were used to a) validate the
technique employed and b) obtain a global overview of the effect of growth on dinitrogen
on Nostoc PCC 7120. In addition, we describe a convenient data-processing pipeline based
on a MySQL database and a web-based graphical user interface. This front-end allows users
to overlay gene-expression data on KEGG (Kyoto Encyclopedia of Genes and Genomes)
pathway maps.



Material and Methods 

Strains and Culture Conditions 

The cyanobacterium Nostoc sp. strain PCC 7120 (formerly Anabaena sp. strain PCC 7120)
was grown on either dinitrogen (nitrogen fixing) or combined nitrogen (non-nitrogen fixing)
in batch cultures. Non-nitrogen fixing conditions were obtained by growing cells in BG11₀
supplemented with 5 mM NH4Cl and 10 mM HEPES. Nitrogen fixing conditions were
obtained by growing cells in BG11₀. All cultures were grown and harvested as previously
described [15, 16].

Preparation of Biotin Labeled, Fragmented cRNA 

Isolation of total RNA from Nostoc PCC 7120 was performed as previously described [17].
From 10 µg of total RNA, low molecular weight RNA, e.g., tRNA and 5S rRNA, was
removed by size exclusion chromatography (MEGAclear kit, Ambion). To remove 16S and
23S rRNA, the MICROBExpress kit from Ambion was used. The remaining RNA was
linearly amplified by a modified Eberwine protocol [18] as follows. Unless stated otherwise,
all enzymes and chemicals were purchased from Invitrogen. First strand synthesis: The
pelleted RNA from the previous mRNA-enrichment steps was resuspended in 4.25 µl water
and mixed with 1 µl of T7 random hexamers (0.5 µg/µl; 5'-GGC GAG TGA ATT GTA
ATA GGA GTG AGT ATA GGG AGG GGG NNN NNN-3'). Following incubation at 70°C
for 10 min, 4°C for 2 min and 23°C for 5 min, 3.75 µl reaction mix (2 µl 5x first strand
synthesis buffer, 1 µl 0.1 M DTT, 0.5 µl 10 mM dNTP mix, 0.25 µl 40 U RNase OUT,
and 200 U Superscript II polymerase) was added to the RNA/primer mix. The first strand
synthesis reaction was performed with the following temperature scheme: 37°C for 20 min,
42°C for 20 min, 50°C for 15 min, 55°C for 10 min and 65°C for 15 min. After adding 0.5
µl RNase H the reaction mix was incubated for another 30 min at 37°C and 2 min at 95°C.
Second strand synthesis: The product of the first strand synthesis was mixed with 43.8 µl
water and 15 µl 5x second strand synthesis buffer (20 U DNA-polymerase I, 1.5 µl 10 mM
dNTP and 1 U RNase H) and incubated for 2 h at 16°C. After addition of 10 U T4
DNA-polymerase the reaction mix was first incubated at 16°C for 15 min and then at 70°C for 10
min. Isolation of ds-cDNA: Double stranded cDNA was isolated from the product of second
strand synthesis according to standard procedures [19]. In vitro transcription: The pelleted
ds-cDNA was resuspended in 1.5 µl water. The MEGAscript T7 kit (Ambion) was used for
in vitro transcription. In addition to the standard nucleotides, 3.75 µl 10 mM Bio-16-CTP
(NEN) and 3.75 µl 75 mM Bio-11-UTP (Roche) were added to the reaction mix. This led to
the formation of biotinylated cRNA. cRNA-isolation: The RNeasy kit (Qiagen) was applied
for cRNA-isolation. All steps were performed according to the manufacturer's instructions.
cRNA-fragmentation: For cRNA-fragmentation 15 µg cRNA was resuspended in 2.5 µl water
and 2.5 µl 2x fragmentation buffer (5x stock: 200 mM Tris, 150 mM Mg-acetate, 500 mM
K-acetate, pH 8.1). The reaction mix was incubated for 5 min at 94°C. The fragmentation
reaction was performed immediately prior to hybridization.

Oligonucleotide Probe Selection 

A unique Nostoc PCC 7120 probe set (as many 25-mer probes per open reading frame
(ORF) as possible) was calculated based on the full genome sequence (retrieved online from
CyanoBase: http://www.kazusa.or.jp/cyanobase/Anabaena/index.html) using a combination
of sequence uniqueness criteria and rules for selection of oligonucleotides likely to
hybridize with high specificity and sensitivity. The selection criteria were as described in
Lockhart et al. [20] with modifications for the longer probes used here (25-mers instead of
20-mers). If available, 10 unique probes per ORF were used in the experiments.
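
The uniqueness part of this selection can be illustrated with a minimal sketch. It assumes that "unique" means an exact 25-mer that occurs in no other ORF of the analyzed set; the hybridization rules of Lockhart et al. [20] are not modelled, and the function names are hypothetical rather than taken from the software actually used.

```python
from collections import Counter

PROBE_LENGTH = 25        # 25-mer probes, as stated in the text
MAX_PROBES_PER_ORF = 10  # up to 10 unique probes per ORF


def kmers(sequence, k=PROBE_LENGTH):
    """Yield every k-mer of a sequence, left to right."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


def select_unique_probes(orfs, k=PROBE_LENGTH, max_probes=MAX_PROBES_PER_ORF):
    """Return up to max_probes k-mers per ORF that occur in no other ORF.

    `orfs` maps ORF identifiers to nucleotide sequences. Only exact-match
    uniqueness is checked (an assumption of this sketch).
    """
    # Count in how many distinct ORFs each k-mer appears.
    orf_count = Counter()
    for sequence in orfs.values():
        orf_count.update(set(kmers(sequence, k)))

    selected = {}
    for orf_id, sequence in orfs.items():
        unique = []
        for probe in kmers(sequence, k):
            # Keep k-mers found in this ORF only, skipping repeats.
            if orf_count[probe] == 1 and probe not in unique:
                unique.append(probe)
            if len(unique) == max_probes:
                break
        selected[orf_id] = unique
    return selected
```

Short ORFs naturally yield fewer candidates, which is one reason an ORF may end up with fewer than 10 probes.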



DNA-Microarray Production and In Situ Oligonucleotide Synthesis

Light-activated in situ oligonucleotide synthesis was performed as described by Singh-Gasson 
et al. [21] using a digital micromirror device, which is part of the geniom one device (febit 
biotec GmbH, Heidelberg/Germany). The synthesis was performed within the geniom 
one device on an activated three-dimensional reaction carrier consisting of a glass-silica- 
glass sandwich (DNA-processor). Four individually accessible microchannels (referred to 
as arrays), etched into the silica layer of the DNA-processor, were connected to the mi- 
crofluidic system of the geniom device. Using standard DNA-synthesis reagents and 3'- 
phosphoramidites with a photolabile protecting group [22,23], oligonucleotides were synthe- 
sized in parallel in all four translucent arrays of one reaction carrier. Prior to synthesis, the 
glass surface was activated by coating with a silane-bound spacer. 

Hybridization 

Non-competitive hybridizations were performed with 7.5 µg fragmented cRNA (see above) in
a final volume of 10 µl. The hybridization solution contained 100 mM MES (pH 6.6), 0.9 M
NaCl, 20 mM EDTA, 0.01% (v/v) Tween 20, 0.1 mg/ml sonicated herring sperm DNA, and
0.5 mg/ml BSA. RNA-samples were heated in the hybridization solution to 95°C for 3 min
followed by 45°C for 3 min before being placed in an array which had been prehybridized for
15 min with 1% (w/v) BSA in hybridization solution at room temperature. Hybridizations
were carried out at 45°C for 16 h. After removing the hybridization solutions, arrays were
first washed with non-stringent buffer (0.005% (v/v) Triton X-100 in 6x SSPE) for 20 min at
25°C and subsequently with stringent buffer (0.005% (v/v) Triton X-100 in 0.5x SSPE) for
20 min at 45°C. After washing, the hybridized RNA was fluorescence-stained by incubating
with 10 µg/ml streptavidin-phycoerythrin and 2 µg/µl BSA in 6x SSPE at 25°C for 15 min.
Unbound streptavidin-phycoerythrin was removed by washing with non-stringent buffer for
20 min at 25°C.

Detection and Data Processing 

The CCD-camera based fluorescence detection system integrated into the geniom one
instrument, equipped with a Cy3 filter set, was used. 36 pixels per spot were available for
data analysis. Processing of raw data, including background correction, array-to-array
normalization and determination of gene-expression levels, as well as calculation of expression
differences, was performed as described before [24]. All steps were carried out using the
PROP algorithm of the geniom application software, which is based on the MOID algorithm
described by Zhou and Abagyan [24]. Background correction is based on probes with no
corresponding mRNA target and the average of the lowest 5% expressed genes. Data
normalization is based on iteratively correcting the raw data on non-regulated genes. A
previous comparative study of Saccharomyces cerevisiae gene expression with three
independent techniques (i.e., Affymetrix GeneChips, geniom one microarrays, and cDNA
microarrays) showed that expression differences greater than ±1.5-fold are significant
for the geniom one platform applied here [13]. In our study we used a stricter criterion:
the upper or lower bound had to be greater than ±2-fold.
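
The 2-fold criterion can be written down as a small sketch. It assumes background-corrected, normalized expression levels as input and uses the sign convention of Table 2 (nitrogen fixation as reference, so genes that are stronger under non-nitrogen fixing conditions get negative values). The function names and class labels are illustrative and are not part of the PROP algorithm.

```python
def signed_fold_change(fixing, non_fixing):
    """Signed expression ratio with nitrogen fixing growth as reference.

    Returns +x if the gene is x-times more strongly expressed under nitrogen
    fixing conditions and -x if it is x-times more strongly expressed under
    non-nitrogen fixing conditions.
    """
    if fixing <= 0 or non_fixing <= 0:
        raise ValueError("expression levels must be positive")
    if fixing >= non_fixing:
        return fixing / non_fixing
    return -(non_fixing / fixing)


def classify(fixing, non_fixing, threshold=2.0):
    """Label a gene using a strict 'greater than threshold' rule."""
    change = signed_fold_change(fixing, non_fixing)
    if change > threshold:
        return "stronger under nitrogen fixing conditions"
    if change < -threshold:
        return "stronger under non-nitrogen fixing conditions"
    return "unchanged"
```

With this convention a gene at exactly 2-fold is still counted as unchanged, because the criterion in the text is "greater than".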

Operon Analysis 

For annotating ORFs belonging to operons, the assignment by Ermolaeva et al. [25] was 
used. Only ORFs belonging to operons with probabilities above 60% were included in the 
analysis. For 51 ORF-pairs the maximum gene-expression difference was calculated (operon 
set). The same number of ORF-pairs was set up by randomly choosing ORFs that do not 
belong to the same operon (non-operon set). For statistical analysis, data from four arrays
were pooled, yielding 204 data points per set, and subjected to non-parametric significance
tests (Mann-Whitney and Kolmogorov-Smirnov tests).
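
As an illustration of the first of these tests, the following is a minimal sketch of a two-sided Mann-Whitney U test that could be applied to the two pooled sets (204 values each). It uses the normal approximation without tie or continuity correction, which is a simplification chosen for brevity; the statistics software actually used is not named in the text, and the Kolmogorov-Smirnov test is not shown.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic of sample_a, approximate two-sided p-value).
    Tied values receive the average of the ranks they span.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])

    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # ranks i+1 .. j share their mean
        members_of_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += average_rank * members_of_a
        i = j

    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_two_sided
```

Here `sample_a` would hold the expression differences of the operon set and `sample_b` those of the non-operon set.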

Probe Secondary Structure and GC-Content Analysis

In order to correlate probe secondary structures and GC-contents with their expression 
value, the secondary structure of each ORF was calculated using the Vienna RNA Package 
Version 1.4 [26]. The probes were aligned to their corresponding ORF and the potential 
probe structure extracted. The number of hybridized (stem duplexes) versus free (loops and 
bulges) probe nucleotides as well as the probe's GC-content were used in further analysis. 
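
The two probe descriptors can be sketched as follows, assuming the predicted structure of the whole ORF is available in dot-bracket notation (the output format of the Vienna RNA Package) and that each probe is located by its start position within the ORF. The function names are hypothetical.

```python
def gc_count(probe):
    """Number of guanine and cytosine nucleotides in a probe sequence."""
    probe = probe.upper()
    return probe.count("G") + probe.count("C")


def probe_structure_counts(orf_structure, probe_start, probe_length=25):
    """Count paired and unpaired positions in the probe's target region.

    `orf_structure` is the dot-bracket string predicted for the whole ORF;
    brackets mark nucleotides in stem duplexes, dots mark loops and bulges.
    Returns (paired, unpaired).
    """
    window = orf_structure[probe_start:probe_start + probe_length]
    if len(window) != probe_length:
        raise ValueError("probe window extends beyond the ORF")
    paired = sum(1 for symbol in window if symbol in "()")
    unpaired = window.count(".")
    return paired, unpaired
```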

HyDaBa Database 

All gene-expression data obtained are saved in the Hydrogenase Database (HyDaBa, http://hydaba.uni-k…).
This relational database allows cross-linking of the expression data with the annotated
genome data from NCBI (http://www.ncbi.nlm.nih.gov) and Cyanobase (http://www.kazusa.or.jp/c…)
and pathway maps available from KEGG (http://www.genome.jp/kegg). The latter is
achieved in real-time via a SOAP-interface. HyDaBa is based on an Apache web server
(http://www.apache.org), a MySQL database (http://www.mysql.com) and a front-end
programmed in PHP (http://www.php.net). All data are publicly accessible via this web
interface.
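
The cross-linking idea can be illustrated with a small relational sketch. The actual HyDaBa schema is not described in the text, so the table and column names below are hypothetical, and Python's built-in sqlite3 module is used here only as a stand-in for MySQL so that the example runs without a database server.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with a hypothetical two-table layout."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(
        """
        CREATE TABLE orf (
            orf_id          TEXT PRIMARY KEY,
            annotation      TEXT,
            metabolic_class TEXT
        );
        CREATE TABLE expression (
            orf_id           TEXT REFERENCES orf(orf_id),
            growth_condition TEXT,
            level            REAL
        );
        """
    )
    return connection


def expression_with_annotation(connection, orf_id):
    """Join expression values with the annotation of one ORF."""
    rows = connection.execute(
        "SELECT o.orf_id, o.annotation, e.growth_condition, e.level "
        "FROM orf AS o JOIN expression AS e ON e.orf_id = o.orf_id "
        "WHERE o.orf_id = ? ORDER BY e.growth_condition",
        (orf_id,),
    )
    return rows.fetchall()
```

Keeping annotation and expression values in separate tables linked by the ORF identifier is what allows all data for a given ORF to be retrieved with a single query.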

Results and Discussion 



In the present study we analyzed the expression of 1249 selected genes from 16 metabolic
categories (ca. 20% of the complete genome) of Nostoc PCC 7120 cultures under nitrogen
fixing and non-nitrogen fixing conditions (Tab. 1). To this end, we applied a DNA-microarray
based approach (Fig. 1).

Preparation of the DNA-Processor 

Oligonucleotide synthesis, hybridization with target cRNA, and signal detection were
performed with one single device, named geniom one (febit biotech GmbH, Heidelberg, Germany,
see [13]). 25-mer oligonucleotide probes were synthesized in situ on the DNA-microarray
surface. In order to obtain a broad picture of gene-expression differences between nitrogen fixing
and non-nitrogen fixing Nostoc PCC 7120 cultures, 500 manually and 749 randomly selected
target genes from all major metabolic categories were analyzed (Tab. 1; [27]). This selection
was based on the genome sequence and annotation available from the CyanoBase consortium
(http://www.kazusa.or.jp/cyanobase/Anabaena/index.html). In order to ensure
reproducibility of the microarray analysis, up to 10 unique 25-mer oligonucleotide probes per
target ORF were distributed randomly over the DNA-processor. Due to their small size, 132
ORFs were represented by fewer than 10 unique probes. Of these, 78 represent unknown and
15 hypothetical proteins. Of the remaining ORFs, only 5 were represented by
fewer than 4 unique probes. Among those is the ORF encoding the heterocyst differentiation
related protein PatN (alr4812).

General Data Analysis 

Figure 2 shows a section of four arrays used in this analysis. Due to in situ probe synthesis
with a digital micromirror device, both the spot morphology and topology are extremely
homogeneous. The use of one physical surface for all arrays and the fixed placement of
the slide during all processing steps result in very low experimental variation. In order to
visualize the signal-to-noise ratio, the fluorescence-intensity ratios of either two RNA-samples
from different growth conditions or two RNA-samples from the same growth condition
are plotted on double-logarithmic scales (Fig. 3). It can be clearly seen that the comparison
of two different metabolic states scatters much more broadly than the self-to-self comparison. The
variance of the data is displayed by their respective Pearson correlation value r². Five ORFs
in the self-to-self comparison show an unexpectedly large variation. In three cases (asr7152,
alr7535 and alr7580) this can be explained by their low, close-to-threshold fluorescence-signal
intensity.
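
A minimal sketch of how two intensity series could be compared on logarithmic scales is given below. It assumes strictly positive intensities and computes the Pearson correlation coefficient of the log10-transformed values; squaring the result gives the coefficient of determination. The exact quantities plotted in Figure 3 are not specified in enough detail here, so this is an illustration rather than the procedure used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / math.sqrt(var_x * var_y)


def log_log_correlation(intensities_a, intensities_b):
    """Pearson r of log10-transformed intensities (must be positive)."""
    log_a = [math.log10(value) for value in intensities_a]
    log_b = [math.log10(value) for value in intensities_b]
    return pearson_r(log_a, log_b)
```

A self-to-self comparison is expected to give a value close to 1, whereas a comparison of two different growth conditions should give a lower value.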

Operon Analysis 

In order to obtain a different insight into the quality of the obtained expression data, we 
exploit the fact that the majority of genes within an operon should exhibit similar expression 
levels. Therefore, we compared the expression-level differences of ORF-pairs that either 
belong to one single operon or not (Fig. 4). The grouping of ORF-pairs into either the operon 
set or the non-operon set was based on data published by Ermolaeva et al. [25]. Their main 
criterion to assign operons is conserved ORF-neighborhood over a broad range of microbes. 
As expected, the expression differences between ORFs not belonging to the same operon are
significantly larger (p<0.001) than those between ORF-pairs belonging to one single operon.

GC-Content and Probe Secondary Structure Analysis

It is often observed in oligonucleotide-based DNA-microarray experiments that probes
directed against one single transcript show large variations in hybridization level. The reason
for this fluctuation remains unknown and is one major reason why the expression value
for each transcript has to be calculated from several unique probes. We analyzed
the influence of both GC-content and secondary structure formation on the hybridization
signal.

Figure 5A shows the correlation between the number of guanine and cytosine nucleotides
(GC-content) in the 25-mer probes and the corresponding hybridization signal. There is
no probe with fewer than 3 or more than 17 GCs. In the range between 7 and 13 GCs there
is a clear linear correlation between GC-content and hybridization signal. Too many (more
than 50%) or too few (less than 25%) GCs in the probe result in non-linear behavior. As
can be seen from the number of observations given in the plot, less than 0.7% (328 out
of 48208) of all unique probes are affected by this non-linear behavior. At the current
stage it is not possible to draw conclusions from the extremes on both sides of Figure 5A
because they are represented by only a few data points. In contrast to the GC-content, the
predicted secondary structure (stems, loops and bulges) of the transcript has only little
influence on the hybridization signal (Fig. 5B). This was expected because (a) the target was
fragmented prior to hybridization, and (b) the hybridization conditions are set
such that no secondary structures should form in either the probe or the target.

The effect of individual probe hybridization signals is usually ignored in oligonucleotide- 
based DNA-microarray experiments. Instead, the average over all probes is used for each 
transcript. However, these effects have to be taken into account if only few probes are 
available for particular transcripts. Furthermore, we can conclude that a large portion of 
the hybridization signal variation is intrinsic to the probe sequence and can not be explained 
by currently known DNA-duplex formation physico-chemistry. 

Data Processing and Visualization 

DNA-microarray experiments involve accumulation and management of large amounts of
data. Apart from the experimental data, information from open access knowledge databases
and sequence analysis is collected. To provide optimal accessibility to all data we set up
a MySQL database on an Apache driven Internet server. The database holds both raw
and processed data. Besides data management, the database allows cross-linking of
expression data with annotations from the NCBI database, Cyanobase and KEGG (Fig. 6). In
order to access and query the database (named HyDaBa), a PHP-based,
Internet-accessible front-end has been developed. This front-end helps to query the data,
guides the user to define and store new queries, allows data up- and download, and can be
easily extended due to its modular setup with template pages. The most important feature
of HyDaBa is the mapping of gene-expression data onto metabolic charts from the
KEGG database (Fig. 6). Technically, this has been achieved by using a SOAP interface [28].
Equally important is the possibility to query for all data available for a given ORF. HyDaBa
can be accessed at http://hydaba.uni-koeln.de



Global Differences in Gene Expression upon Growth on Dinitrogen 

Growth on dinitrogen as sole nitrogen source acts like a positive transcriptional switch in
Nostoc PCC 7120. A much larger fraction of genes is more strongly expressed under nitrogen
fixing than under non-nitrogen fixing conditions, and only a minority of genes shows a
decreased expression level. Only 17 annotated and 12 hypothetical ORFs exhibit a significantly
higher expression under non-nitrogen fixing conditions (Tab. 2), whereas 281 annotated and
184 hypothetical ORFs are more strongly expressed under nitrogen fixing conditions. In Figure
7, these gene-expression differences are clustered according to the participation of the
corresponding ORF in specific metabolic categories. The most strongly expressed genes participate
in photosynthesis and respiration (K). Closer analysis reveals that 21 of the 29 most strongly
expressed genes in this group belong to photosynthesis, 11 of which encode structural proteins
of phycobilisomes (Tab. 3). These findings illustrate the extensive energy demand
of nitrogen fixation. Apparently, the cell expands its light harvesting complexes in order to
direct more light energy to the photosystems and produce more ATP and NADPH.
The stronger expression of proteins involved in respiration underlines previous findings that
the respiration rate is increased under nitrogen fixing conditions in cyanobacteria. It is
believed that this process supports the removal of oxygen, which otherwise would inactivate the
nitrogenase enzyme complex.

Nitrogen Content of Gene Products 

It has been shown previously for yeast and E. coli that enzymes involved in the assimilation
of sulfur, nitrogen, or carbon are depleted in that respective atom [29]. Furthermore, upon
sulfur depletion, yeast spares sulfur rich proteins by either reducing their expression level
or by expressing isoforms that are poor in sulfur [30]. We are interested in the global
response of Nostoc PCC 7120 to the depletion of both inorganic and organic nitrogen in
the culture medium. To address this question we tested whether cells grown on dinitrogen as sole
nitrogen source tend to spare nitrogen. This should be reflected in the nitrogen content
of the gene products. As a rough approximation we assume that the expression level of a
particular gene is proportional to the amount of protein the gene encodes. Then we can
analyze whether the proteome shows differences with respect to the nitrogen content upon
growth on dinitrogen as nitrogen source. Figure 8 shows the difference of expressed nitrogen
atoms (number of nitrogen atoms per transcript times expression value) under nitrogen fixing
and non-nitrogen fixing conditions. We observed a significant difference between these two
conditions. Nostoc PCC 7120 does not save but rather spends nitrogen under nitrogen
fixing conditions (Fig. 8). We conclude that the cells are freed from any limitations of
nitrogen utilization once the nitrogenase enzyme complex is expressed and active. This has
major implications for the utilization of cyanobacteria for, e.g., hydrogen gas production [2].
Although depletion of inorganic nitrogen is a prerequisite for nitrogenase based production of
hydrogen gas, the cells are not facing stress from nitrogen limitation. Thus, the biomass
obtained from such photobioreactors will yield a rich food source for, e.g., livestock farming.
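
The quantity "expressed nitrogen atoms" can be sketched as follows. The sketch assumes that the nitrogen atoms are counted in the encoded protein (one backbone nitrogen per residue plus the side-chain nitrogens of Arg, His, Lys, Asn, Gln and Trp) and that the expression value is used as a proxy for protein amount, as stated above. Whether the original analysis counted nitrogen in exactly this way is not specified, and the function names are illustrative.

```python
# Side-chain nitrogen atoms per residue (one-letter code); every residue
# additionally contributes one backbone nitrogen.
SIDE_CHAIN_NITROGEN = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}


def nitrogen_atoms(protein):
    """Number of nitrogen atoms in a protein given as one-letter sequence."""
    return sum(1 + SIDE_CHAIN_NITROGEN.get(residue, 0)
               for residue in protein.upper())


def expressed_nitrogen_difference(protein, level_fixing, level_non_fixing):
    """Nitrogen atoms times expression value, nitrogen fixing minus
    non-nitrogen fixing conditions."""
    return nitrogen_atoms(protein) * (level_fixing - level_non_fixing)
```

A positive difference means that more nitrogen is bound in this gene product under nitrogen fixing conditions.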






Heterocyst Related Genes 

As a key global regulator, NtcA plays an important role in the expression of many genes 
involved in heterocyst differentiation and nitrogen assimilation. For the unicellular, non- 
differentiating cyanobacterium Synechococcus PCC 7942 it has been shown that the binding 
affinity of NtcA to its target DNA-sequence is elevated by 2-oxoglutarate [31,32]. Thus, 
2-oxoglutarate exerts a direct role on NtcA-mediated transcription activation. Furthermore, 
it plays a central role in sensing the nitrogen status, or rather the C/N-balance and acts as a 
substrate in glutamate synthesis, which in turn is one of the first metabolic steps of
ammonium assimilation (Fig. 9). We found that key enzymes catalyzing the synthesis of
2-oxoglutarate, aconitate hydratase (2.8-fold; EC 4.2.1.3) and isocitrate dehydrogenase
(4.7-fold; EC 1.1.1.42), are more strongly expressed under nitrogen fixing conditions (Fig. 9).
The hetC gene (alr2817), which encodes a putative ABC-transporter that is essential for
heterocyst formation, has been shown to be a direct target of the transcriptional regulator
NtcA [33]. Indeed, we see 5.8-fold stronger expression under nitrogen fixing conditions.
Table 4 shows expression differences for all known ORFs involved in heterocyst formation 
included in this study. 

Nitrogen Metabolism Related Genes 

The conversion of dinitrogen to ammonia, catalyzed by the nitrogenase enzyme complex,
is only the first step in a series of reactions that make nitrogen available to the cell. The
nitrogenase enzyme complex provides two products that are metabolized further, hydrogen gas
and ammonia. The former is taken up by an uptake hydrogenase, while the latter
is incorporated into glutamate by glutamine synthetase, yielding glutamine. Figure 9 gives
an overview of the main pathways and enzyme complexes involved in nitrogen fixation.
The nitrogenase consists of three subunits, the molybdenum-iron protein alpha chain (NifD,
all1454), the molybdenum-iron protein beta chain (NifK, all1440), and the iron protein
(NifH, all1455). We found nifK and nifH to be more than 10-times more strongly expressed
under nitrogen fixing conditions. One ORF (nifH2, alr0874) of as yet unknown function that
is paralogous (92% similarity, 86% identity) to the iron protein (nifH, all1455) was found
to be expressed at a very low level and is only slightly more strongly expressed under nitrogen
fixing conditions (Fig. 9). Thus, the gene product of nifH2 is probably not involved in the
nitrogen fixation reaction. The glutamine synthetase (glutamate-ammonia ligase) is encoded
by glnA (alr2328). Northern blot studies in Nostoc PCC 7120 have shown that the glnA
transcript is present in both nitrogen fixing and non-nitrogen fixing cultures, but more
abundant in the latter [34]. This is in accordance with our results.

Other Genes 

Phycobilisomes 

Phycobilisomes are the major light-harvesting complexes of cyanobacteria. These are multi- 
protein assemblies that are functionally associated with photosystem II and constitute up to 
50% of the total cellular protein. It has been shown previously that phycobilisomes serve as
a nitrogen store. Upon nitrogen starvation they can be completely degraded within two
days. Phycobilisome degradation is thought to provide substrates for protein biosynthesis.
As discussed above, we observed strong expression of major components involved in
photosynthesis under nitrogen fixing conditions. Accordingly, all phycobilisome proteins
included in this analysis are around 3-times more strongly expressed under nitrogen
fixing conditions.

Respiratory Terminal Oxidases

Nostoc PCC 7120 possesses three cytochrome c oxidase gene clusters, cox1 (alr0950, alr0951,
alr0952), cox2 (alr2514, alr2515, alr2516), and cox3 (alr2729, alr2730, alr2731, alr2732,
alr2734) [35]. While cox1 and cox2 are homologous to aa3-type cytochrome
c oxidases [36,37], cox3 is most similar to alternative respiratory terminal oxidases [35]. The
expression of cox2 and cox3 has been reported to be restricted to heterocysts [35]. In
accordance with this result, we see no difference in the expression of cox1, while cox2 and cox3
are more than 4-times more strongly expressed under nitrogen fixing conditions (see Table 5).

Adaptations and Atypical Conditions 

One group of genes that is up to 11-times more strongly expressed under non-nitrogen fixing
conditions belongs to the high light-induced proteins (HLIP-family; Tab. 6) [38,39]. These
proteins belong to the CAB/ELIP/HLIP-superfamily and are evolutionarily related to each
other [38]. While CAB (chlorophyll a/b-binding) proteins are major constituents of the light
harvesting complexes, ELIPs (early light-induced proteins) and HLIPs take over
photoprotective functions. The HLIP-family in pro- and eukaryotic photosynthetic organisms
consists of more than 100 different stress proteins which have one membrane spanning alpha
helix. They accumulate only transiently in photosynthetic membranes in response to light
stress and have photoprotective functions. At the amino acid level, members of the
HLIP-family are closely related to light-harvesting chlorophyll a/b-binding antenna proteins of
photosystems I and II, present in higher plants and some algae. Despite this similarity it
is believed that HLIP-proteins fulfill their photoprotective role by either transient binding
of free chlorophyll molecules or by participating in energy dissipation [39]. Photo-oxidative
stress is not necessarily connected to high light fluxes but can also be caused by nutrient
deprivation that ultimately leads to oversaturation of the photosynthetic electron transport
chain. At this point one can only speculate why the HLIP-family is more strongly expressed
under non-nitrogen fixing conditions. Since the same light and temperature settings were
employed for both growth conditions, one argument would be that more light is required
(and thus "consumed") under nitrogen fixing conditions. Indeed, the nitrogenase enzyme
complex has immense energy and reducing power demands. Together with the higher
availability of nitrogen upon growth on dinitrogen (see above), we conclude that this growth
condition frees Nostoc PCC 7120 from stress. To our knowledge this is the first report of
differential expression of members of the HLIP-family in cyanobacteria upon combined-nitrogen
deprivation. Analysis of the location of the HLIP-family members shown in Table
6 reveals no link to known ORFs that are involved in nitrogen metabolism. Only asl0449 is
located immediately downstream of the allophycocyanin alpha subunit (apcA, all0450), which
suggests a potential functional relation to photosynthesis. Future work might
help to uncover the function of these proteins in pro- and eukaryotes.

Tables



Table 1 - Analyzed ORFs. Number of ORFs analyzed and present in each metabolic cate- 
gory of Nostoc PCC 7120. 

    Metabolic Class                                              Number of ORFs analyzed

A   Amino acid biosynthesis                                        64 of  113 (57%)
B   Biosynthesis of cofactors, prosthetic groups and carriers      14 of  152  (9%)
C   Cell envelope                                                  15 of   77 (19%)
D   Cellular processes                                             47 of   96 (49%)
E   Central intermediary metabolism                                40 of   72 (56%)
F   DNA-replication, recombination, and repair                     29 of  105 (28%)
G   Energy metabolism                                              20 of  100 (20%)
H   Fatty acid, phospholipid and sterol metabolism                  2 of   40  (5%)
I   Hypothetical                                                  567 of 3573 (16%)
J   Other categories                                              204 of  678 (30%)
K   Photosynthesis and respiration                                 98 of  157 (62%)
L   Purines, pyrimidines, nucleosides, and nucleotides              2 of   58  (3%)
M   Regulatory functions                                           36 of  360 (10%)
N   Transcription                                                  31 of   41 (76%)
O   Translation                                                    13 of  200  (7%)
P   Transport and binding proteins                                 67 of  313 (21%)
    Total                                                        1249 of 6135 (20%)
Table 2 - Highly Expressed ORFs Under Non-Nitrogen Fixing Conditions. Annotated 
ORFs that show a more than 2-times higher expression level under non-nitrogen fixing than 
under nitrogen fixing conditions (the values are negative because nitrogen fixation was taken 
as reference). All ORFs annotated as putative, hypothetical, or unknown have been omitted. 
The key for metabolic classes can be found in Table 1. 

ORF      Class  Submetabolism                                         Annotation                                    Expr. Diff. 

all0410  (A)    Aromatic amino acid family                            tryptophan synthase beta subunit TrpB           -2.0 
alr7354  (B)    Thioredoxin, glutaredoxin, and glutathione            glutathione S-transferase                       -5.9 
all0166  (E)    Polysaccharides and glycoproteins                     alpha,alpha-trehalase                           -6.9 
asl0873  (J)    Adaptations and atypical conditions                   CAB/ELIP/HLIP superfamily                       -4.4 
asr3042  "      "                                                     "                                              -11.3 
asr3043  "      "                                                     "                                               -7.9 
asl0449  "      "                                                     "                                               -2.3 
all0168  "      Other                                                 alpha-amylase                                   -9.5 
all0275  "      "                                                     glycerophosphoryl diester phosphodiesterase     -3.4 
all0167  "      "                                                     maltooligosyltrehalose synthase                 -2.3 
all0875  "      "                                                     probable alpha-glucanotransferase               -4.4 
all0412  "      "                                                     putative zinc-binding oxidoreductase            -2.2 
alr2405  (K)    Soluble electron carriers                             flavodoxin                                      -7.6 
all0258  "      "                                                     plastocyanin precursor; PetE                    -2.3 
alr0702  (O)    Degradation of proteins, peptides, and glycopeptides  serine proteinase                               -2.8 
all2674  (P)    Protein modification and translation factors          ferrichrome-iron receptor                       -2.1 
all0322                                                               sulfate-binding protein SbpA                    -4.0 

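
The sign convention used in Table 2 can be illustrated with a short sketch. The source only states that nitrogen fixation was taken as reference and that values are negative when expression is higher under non-nitrogen fixing conditions; the exact formula below (ratio of the larger to the smaller intensity, with a sign) is an assumption, and the function names `signed_fold_change` and `is_regulated` are hypothetical.

```python
def signed_fold_change(fixing: float, non_fixing: float) -> float:
    """Signed fold change with the nitrogen-fixing condition as reference.

    Assumed convention (not spelled out in the source): positive when the
    ORF is more strongly expressed under nitrogen fixation, negative when
    it is more strongly expressed under non-nitrogen fixing conditions.
    """
    if fixing <= 0 or non_fixing <= 0:
        raise ValueError("intensities must be positive")
    if fixing >= non_fixing:
        return fixing / non_fixing
    return -(non_fixing / fixing)


def is_regulated(fixing: float, non_fixing: float, threshold: float = 2.0) -> bool:
    """True if the expression differs by more than `threshold`-fold."""
    return abs(signed_fold_change(fixing, non_fixing)) > threshold


if __name__ == "__main__":
    # Hypothetical intensities, chosen only to illustrate the sign.
    print(signed_fold_change(1000.0, 4400.0))  # negative: stronger without N2 fixation
    print(is_regulated(1000.0, 1500.0))        # below the 2-fold cut-off
```

With this convention, an ORF that is equally expressed in both conditions gets a value of 1, so no value falls strictly between -1 and 1.
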
Table 3 - Photosynthesis and Respiration. Highly expressed ORFs involved in photosynthesis 
or respiration. Expression levels are at least 30x above background (300). 

ORF      Submetabolism      Annotation                                                             +N      -N 

all0138  PS II              CP47 protein                                                        13326    9658 
all0259  PS II              cytochrome c550                                                      9311   13906 
asr3845  PS II              cytochrome b559 alpha subunit                                       12527   10674 
asr3846  PS II              cytochrome b559 beta subunit                                         7606   10848 
alr3421  Cyt. b6/f          plastoquinol-plastocyanin reductase, cytochrome b6; PetB            16697   12112 
alr3422  Cyt. b6/f          plastoquinol-plastocyanin reductase, apocytochrome subunit 4; PetD   9381   13981 
all0109  PS I               Subunit III precursor; PsaF                                         19723   23286 
all0107  PS I               Subunit XI; PsaL                                                    18416   25334 
all4121  Electron Carriers  ferredoxin-NADP(+) reductase                                         5205   12290 
all4148  Electron Carriers  ferredoxin I                                                        10984   14541 
alr0021  Phycobilisome      allophycocyanin alpha subunit; ApcA                                 28417   24927 
alr0022  Phycobilisome      allophycocyanin beta subunit; ApcB                                  25066   19930 
asr0023  Phycobilisome      core linker protein Lc7.8; ApcC                                     13165   16575 
alr0020  Phycobilisome      core-membrane linker protein ApcE                                    7285   18364 
alr0529  Phycobilisome      phycocyanin alpha chain; CpcA                                       16927   18047 
alr0528  Phycobilisome      phycocyanin beta chain; CpcB                                        20430   15579 
alr0530  Phycobilisome      phycocyanin-associated rod linker protein CpcC                      13993   11404 
alr0532  Phycobilisome      phycocyanobilin lyase alpha subunit; CpcE                           11497   18861 
alr0524  Phycobilisome      phycoerythrocyanin alpha chain; PecA                                11660   13985 
alr0523  Phycobilisome      phycoerythrocyanin beta chain; PecB                                 22153   16828 
alr0525  Phycobilisome      phycoerythrocyanin-associated rod linker protein PecC                4337   12756 
all1842  NADH DH            NADH dehydrogenase                                                   2762   15611 
all3840  NADH DH            Chain J                                                              9877   14945 
alr0225  NADH DH            Subunit 6; NdhG                                                      8989   10480 
alr0226  NADH DH            Subunit 4L; NdhE                                                     9310   12526 
alr3956  NADH DH            Subunit 5                                                            7635   12323 
all0006  ATP Synthase       Subunit delta; AtpD                                                 10175   10859 
all0010  ATP Synthase       Subunit a; AtpI                                                     15528   17550 
all0011  ATP Synthase       Subunit 1; Atp1                                                     13805   13505 

Table 4 - Heterocysts. ORFs included in the present study that are specific to 
heterocysts and show significant expression differences. Expr: expression difference between 
non-nitrogen and nitrogen fixing condition. 



ORF Description Expr 



all0521 two-component response regulator, heterocyst pattern formation protein PatA 2.4 

all0813 heterocyst-specific glycolipids-directing protein HglK 4.6 

all1430 heterocyst ferredoxin; FdxH 12.9 

all1431 HesB protein 9.7 

all1432 HesA protein 8.6 

all1730 similar to HetF protein 3.4 

all2512 transcriptional regulator; PatB 2.5 

all5346 heterocyst specific ABC-transporter, membrane spanning subunit DevC homolog 8.9 

all5347 heterocyst specific ABC-transporter, membrane fusion protein DevB homolog 3.0 

alr1603 putative heterocyst to vegetative cell connection protein (fraH) 2.4 

alr2339 heterocyst differentiation protein HetR 2.1 

alr2817 heterocyst differentiation protein HetC 5.8 

alr2818 heterocyst differentiation protein HetP 2.6 

alr2834 glycosyltransferase; hepC 18.5 

alr2835 heterocyst specific ABC-transporter; hepA 5.7 

alr3234 similar to heterocyst formation protein HetP 6.6 

alr3648 heterocyst specific ABC-transporter, membrane spanning subunit DevC homolog 2.7 

alr3649 heterocyst specific ABC-transporter, ATP-binding subunit DevA homolog 3.7 

alr3698 heterocyst envelope polysaccharide synthesis protein HepB 6.5 

alr3710 heterocyst specific ABC-transporter, membrane fusion protein DevB 3.3 

alr3711 heterocyst specific ABC-transporter, membrane spanning subunit DevC 12.4 

alr3712 heterocyst specific ABC-transporter, ATP-binding subunit DevA 13.1 

alr4281 heterocyst specific ABC-transporter, membrane spanning subunit DevC homolog 3.5 

alr4812 heterocyst differentiation related protein PatN 2.3 

alr5355 heterocyst glycolipid synthase; HglC 4.1 

alr5358 ketoacyl reductase; HetN 4.0 
Table 5 - Respiratory Terminal Oxidases. Expression differences for respiratory terminal 
oxidases under nitrogen fixing conditions. 

ORF      Description                                 Expr 

COX1 
alr0950  cytochrome c oxidase subunit II (coxB)      less than ±2 
alr0951  cytochrome c oxidase subunit I (coxA)       less than ±2 
alr0952  cytochrome c oxidase subunit III (coxC)     less than ±2 

COX2 
alr2514  cytochrome c oxidase subunit II (coxB)      4.7 
alr2515  cytochrome c oxidase subunit I (coxA)       7.0 
alr2516  cytochrome c oxidase subunit III (coxC)     31.8 

COX3 
alr2731  cytochrome c oxidase subunit II (coxB)      less than ±2 
alr2732  cytochrome c oxidase subunit I (coxA)       3.3 
alr2734  cytochrome c oxidase subunit III (coxC)     4.7 

Table 6 - HLIP-family. Expression differences for the HLIP-family involved in adaptations 
and atypical conditions. Negative values indicate stronger expression under non-nitrogen 
fixing conditions. 

ORF Description Expr-Diff 

asl0449 CAB/ELIP/HLIP-related protein -2.3 

asl0514 CAB/ELIP/HLIP-superfamily less than ±2 

asl0873 CAB/ELIP/HLIP-superfamily -4.4 

asl2354 CAB/ELIP/HLIP-related protein less than ±2 

asr3042 CAB/ELIP/HLIP-superfamily of proteins -11.3 

asr3043 CAB/ELIP/HLIP-superfamily of proteins -7.9 

asl3726 CAB/ELIP/HLIP-superfamily less than ±2 

asr5262 CAB/ELIP/HLIP-superfamily of proteins less than ±2 

Figures 

Figure 1 - Experimental Setup. Nostoc PCC 7120 batch cultures were grown under 
nitrogen fixing (-NH4) or non-nitrogen fixing conditions (+NH4), harvested and 
subjected to total-RNA purification. After removal of rRNA by means of affinity 
chromatography, labeled and fragmented cRNA was hybridized to four individual 
arrays on a single physical slide. 
Figure 2 - DNA-Microarrays. Sections of four arrays used in the present analysis. 
The bright spots at the top and bottom of the arrays are controls used by the 
spot-finding software. (1) non-nitrogen fixing culture (2) non-nitrogen fixing culture, 
biological replicate (3) nitrogen fixing culture (4) nitrogen fixing culture, biological 
replicate. 
Figure 3 - Data Quality. For the double-logarithmic plots non-normalized raw 
data were used. The left plot shows the gene-expression differences between 
nitrogen fixing (ordinate) and non-nitrogen fixing, ammonia-grown (abscissa) cultures 
of Nostoc PCC 7120. The right plot visualizes the gene-expression differences 
between two individual nitrogen fixing cultures. Significant outliers are marked by 
arrows. The straight and dashed lines represent 2- and 3-fold expression differences, 
respectively. 
Figure 4 - Operon Analysis. Pairs of ORFs were split into two groups depending on 
whether they belong to a single operon or not. Then, the expression 
difference of each ORF pair was calculated. The mean of the resulting differences 
in both groups, along with two times the standard error, is plotted. Non-parametric 
significance tests yield p-values below 0.001. 
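
The quantities plotted in Figure 4 can be sketched as follows. This is a minimal illustration, not the analysis code used for the figure: the source does not name the non-parametric test, so a two-sided permutation test on the difference of group means is used here as an assumed stand-in, and the function names and sample values are hypothetical.

```python
import math
import random
import statistics


def mean_and_2se(differences):
    """Return the mean and two times the standard error of the mean."""
    mean = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return mean, 2.0 * standard_error


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Assumption: any non-parametric two-sample test could be used here;
    the source does not specify which one was applied.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical absolute log-expression differences of ORF pairs.
    same_operon = [0.10, 0.20, 0.15, 0.30, 0.25, 0.20, 0.10, 0.05]
    different_operon = [1.0, 1.5, 0.9, 2.0, 1.2, 1.7, 1.1, 1.4]
    print(mean_and_2se(same_operon))
    print(mean_and_2se(different_operon))
    print(permutation_p_value(same_operon, different_operon))
```

If ORFs within one operon are co-transcribed, the first group should show the smaller mean difference, which is the pattern the caption describes.
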
Figure 5 - Hybridization Effects. The logarithmized intensity of the hybridization 
signal is plotted (A) against the GC-content of the 25-mer probe sequence, and 
(B) against the number of free secondary structure features (bulges and loops) in 
the target sequence. The notches around the medians indicate the 95% confidence 
interval for comparing the median of one box with the median of another box, i.e., 
if the notches do not overlap, the corresponding medians are significantly different. 
The upper and lower parts of the box indicate the third and second quartile, 
respectively. The whiskers extend from the box to the extreme values. The numbers above 
the upper whiskers state the number of observations for the corresponding box. 
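
As an illustration of the grouping behind panel (A) of Figure 5, the following sketch computes the GC-content of a probe and the median log intensity per GC class. It assumes base-10 logarithms and grouping by the absolute number of G and C bases; neither detail is stated in the caption, and the function names and example probes are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def gc_count(probe: str) -> int:
    """Number of G and C bases in a probe sequence."""
    seq = probe.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("probe must be a non-empty DNA sequence (A, C, G, T)")
    return seq.count("G") + seq.count("C")


def gc_content(probe: str) -> float:
    """Fraction of G and C bases in a probe sequence."""
    return gc_count(probe) / len(probe)


def median_log_intensity_by_gc(probes):
    """Group (sequence, intensity) pairs by GC count.

    Returns a dict mapping GC count to the median log10 intensity,
    i.e. the value marked by the median line of each box.
    """
    groups = defaultdict(list)
    for sequence, intensity in probes:
        groups[gc_count(sequence)].append(math.log10(intensity))
    return {gc: median(values) for gc, values in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical 25-mer probes with made-up intensities.
    example = [
        ("ATGC" * 6 + "A", 1200.0),
        ("GGCC" * 6 + "G", 9500.0),
        ("ATAT" * 6 + "A", 310.0),
    ]
    for sequence, _ in example:
        print(sequence, gc_content(sequence))
    print(median_log_intensity_by_gc(example))
```
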
Figure 6 - HyDaBa Database. All experimental gene-expression data are saved in 
a relational database (http://hydaba.uni-koeln.de). This database allows 
cross-linking of the gene-expression data with genome data from NCBI and Cyanobase 
and metabolic charts available from KEGG. HyDaBa is based on an Apache web 
server, a MySQL database and a front-end programmed in PHP. All data are 
publicly accessible via this web interface. Upper left: database outline; Lower left: 
screenshot of metabolic categories; Right: screenshot of expression data overlaid 
on metabolic charts. 
Figure 7 - Global Expression Differences. Measured gene-expression differences 
in major metabolic categories. The lower plot shows the median expression level of 
the analyzed ORFs in the corresponding metabolic class. The upper plot shows the 
corresponding relative change (maximum range: -1 to 1). The numbers above the 
bars represent the number of ORFs analyzed in the corresponding category. The 
legend to the letters denoting metabolic classes can be found in Table 1. 
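
The caption of Figure 7 does not give the formula for the relative change. One definition that is bounded by -1 and 1 for non-negative expression levels is the difference divided by the sum; the sketch below uses it purely as an assumption. The function names, the ordering of the value pairs, and the example numbers are hypothetical.

```python
from statistics import median


def relative_change(fixing: float, non_fixing: float) -> float:
    """Assumed relative change: (fixing - non_fixing) / (fixing + non_fixing).

    For non-negative inputs the result lies between -1 and 1.
    """
    if fixing < 0 or non_fixing < 0:
        raise ValueError("expression levels must be non-negative")
    total = fixing + non_fixing
    if total == 0:
        raise ValueError("at least one expression level must be positive")
    return (fixing - non_fixing) / total


def summarize_class(levels):
    """Median levels and relative change for one metabolic class.

    `levels` is a list of (fixing, non_fixing) pairs, one per ORF.
    """
    median_fixing = median(pair[0] for pair in levels)
    median_non_fixing = median(pair[1] for pair in levels)
    return {
        "n_orfs": len(levels),
        "median_fixing": median_fixing,
        "median_non_fixing": median_non_fixing,
        "relative_change": relative_change(median_fixing, median_non_fixing),
    }


if __name__ == "__main__":
    # Hypothetical expression levels for three ORFs of one class.
    print(summarize_class([(300.0, 100.0), (400.0, 200.0), (900.0, 300.0)]))
```
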
Figure 8 - Nitrogen Content. Difference of absolute numbers of expressed nitrogen 
atoms between nitrogen fixing and non-nitrogen fixing conditions. The notches 
around the median indicate the 95% confidence interval for comparing the median of one 
box with the median of another box, i.e., if the notches do not overlap, the 
corresponding medians are significantly different. The upper and lower parts of the box 
indicate the third and second quartile, respectively. The whiskers extend from 
the box to the extreme values. The numbers above the upper whiskers state the 
number of observations for the corresponding box. 
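
The nitrogen content of a gene product can be counted directly from its amino acid sequence: every residue carries one backbone nitrogen, and arginine, histidine, lysine, asparagine, glutamine and tryptophan carry additional side-chain nitrogens. The sketch below does this count. How the count was weighted by expression level for Figure 8 is not described in the caption, so the product used in `expressed_nitrogen_difference` is an assumption, and both function names are hypothetical.

```python
# Additional nitrogen atoms in the side chain, per one-letter residue code.
EXTRA_SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def nitrogen_atoms(protein: str) -> int:
    """Total number of nitrogen atoms in a protein sequence.

    Counts one backbone nitrogen per residue plus side-chain nitrogens.
    """
    seq = protein.upper()
    unknown = set(seq) - STANDARD_RESIDUES
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(1 + EXTRA_SIDE_CHAIN_N.get(residue, 0) for residue in seq)


def expressed_nitrogen_difference(protein: str, fixing: float, non_fixing: float) -> float:
    """Assumed weighting: nitrogen atoms times the expression difference."""
    return nitrogen_atoms(protein) * (fixing - non_fixing)


if __name__ == "__main__":
    # Hypothetical short peptide and made-up expression levels.
    peptide = "MKRHW"
    print(nitrogen_atoms(peptide))
    print(expressed_nitrogen_difference(peptide, 1500.0, 500.0))
```
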
Figure 9 - Nitrogen Metabolism. The main components involved in nitrogen 
fixation and assimilation include the nitrogenase (nif genes), the glutamine 
synthetase (GS, glnA), and glutamine 2-oxoglutarate amidotransferase (GOGAT, gltS). 
2-oxoglutarate is synthesized from oxaloacetate and acetyl-CoA by the incomplete 
TCA cycle, of which the aconitate hydratase (EC 4.2.1.3) and isocitrate 
dehydrogenase (EC 1.1.1.42) are shown. 